Abstract

In this correspondence, we point out a problem with testing adaptive classifiers on
autocorrelated data: in such a case, random change alarms may boost the accuracy figures.
Hence, we cannot be sure whether the adaptation is working well.
Testing on the Electricity data



Concept drift (Widmer and Kubat, 1996) has become a popular research topic over the last
decade, and many adaptive classification algorithms have been developed. The setting is
as follows. Multidimensional input data arrive over time; the goal is to predict the
class label $y$ for each instance $X$. In the stationary setting, a classifier $C: X \to y$ is trained
once and remains fixed during operation. In the concept drift scenario, the joint data
distribution $p(X, y)$ may change over time, and, as a result, a fixed predictor $C$ may
lose accuracy. To counter this, the classifier can be updated at every time step,
taking into account the most recent data: $C_t = f(C_{t-1}, X_{t-1}, y_{t-1})$, where $t$ is
the time step and $f$ is the adaptation function.
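One common way to evaluate such an update rule is test-then-train (prequential) evaluation: each instance first tests the current classifier and only then is used to adapt it. The sketch below illustrates that loop under this assumption. The names `prequential_accuracy`, `majority_predict` and `majority_update` are hypothetical, not part of MOA or any other library, and the toy adaptation function ignores $X$ entirely.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple, TypeVar

State = TypeVar("State")


def prequential_accuracy(
    stream: Iterable[Tuple[object, Hashable]],
    init: State,
    predict: Callable[[State, object], Hashable],
    update: Callable[[State, object, Hashable], State],
) -> float:
    """Test-then-train loop: predict y_t with C_t, then set C_{t+1} = f(C_t, X_t, y_t)."""
    model = init
    correct = total = 0
    for x, y in stream:
        correct += predict(model, x) == y  # test on the instance first ...
        total += 1
        model = update(model, x, y)  # ... then adapt with its true label
    return correct / total if total else 0.0


# Hypothetical toy adaptation function f: the "model" is just a label count.
def majority_predict(counts: Counter, x: object) -> Hashable:
    return counts.most_common(1)[0][0] if counts else "DOWN"  # assumed default


def majority_update(counts: Counter, x: object, y: Hashable) -> Counter:
    new_counts = counts.copy()  # return a new state instead of mutating C_t
    new_counts[y] += 1
    return new_counts


labels = ["DOWN", "DOWN", "UP", "DOWN"]
print(prequential_accuracy([(None, y) for y in labels], Counter(), majority_predict, majority_update))  # 0.75
```

Any real adaptive learner can be plugged in by supplying its own `predict` and `update` pair; the loop itself stays unchanged.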



The Electricity dataset due to Harries (1999) is a popular benchmark for testing adaptive
classifiers.¹ It has been used in over 40 concept drift experiments.² The dataset covers
a period of two years (45 312 instances recorded every half hour, 6 input variables). The
binary classification task is to predict a rise (UP) or a fall (DOWN) in the electricity price in
New South Wales (Australia). The prior probability of DOWN is 58%. The data are subject
to concept drift due to changing consumption habits, unexpected events and seasonality.

This dataset has an important property that must not be ignored when evaluating concept
drift adaptation. Suppose we employ a naive predictor that predicts the next label to be
the same as the current label (the moving average of one). For instance, if the price goes
UP now, it predicts that the price will go UP at the next time step as well. If the labels were
independent, such a predictor would achieve 51% accuracy.³ However, if we
test this naive approach on the Electricity dataset, it gives a much higher accuracy of 85%. This
happens because the labels are not independent: there are long consecutive periods of UP
and long consecutive periods of DOWN. Figure 1 plots the autocorrelation function⁴ of the
labels.
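These quantities can be checked directly from a label sequence. The sketch below assumes the labels are given as a Python list of the strings "UP" and "DOWN"; the function names are hypothetical, and loading the dataset is left out. It computes the accuracy of the moving average of one, the accuracy the same predictor would have if the labels were independent (footnote 3), and the sample autocorrelation of the 0/1-coded labels at a given lag. The standard sample estimator used here is an assumption; the note does not state which estimator Figure 1 uses.

```python
def persistence_accuracy(labels: list[str]) -> float:
    """Accuracy of the moving average of one: predict y_t = y_{t-1}."""
    pairs = list(zip(labels, labels[1:]))
    return sum(prev == cur for prev, cur in pairs) / len(pairs) if pairs else 0.0


def independent_persistence_accuracy(p_down: float) -> float:
    """Expected accuracy of the same predictor if labels were independent:
    p(UP)^2 + p(DOWN)^2."""
    return p_down ** 2 + (1.0 - p_down) ** 2


def label_autocorrelation(labels: list[str], lag: int, positive: str = "UP") -> float:
    """Sample autocorrelation of the 0/1-coded labels at the given lag."""
    x = [1.0 if y == positive else 0.0 for y in labels]
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(labels)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


print(independent_persistence_accuracy(0.58))  # about 0.51, as in footnote 3
```

On the Electricity labels, the note reports about 85% for the first quantity against about 51% for the second; `label_autocorrelation(labels, 48)` would probe the daily peak mentioned in footnote 4.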



1. The dataset is available, e.g., from http://www.liaad.up.pt/~jgama/ales/ales_5.html

2. Google Scholar, January 2013.

3. If the labels are independent, the probability that two consecutive labels are the same is $p(\text{UP})^2 + p(\text{DOWN})^2 = 0.42^2 + 0.58^2 \approx 0.51$.

4. The autocorrelation peaks every 48 instances (24 hours) due to the daily cycles of electricity consumption.
Figure 1: Autocorrelation function of the Electricity labels.



Figure 2: Naive classification accuracies on the Electricity dataset.



The problem with evaluating adaptive classifiers on such a dataset is that we cannot
be sure whether a change detector (and the adaptation) is working well. Suppose we have a classifier
with a worthless change detection mechanism: it fires a change alarm after any instance
at random with probability p. After an alarm, the classifier is restarted and
continues training on the most recent data. Suppose, moreover, that we take no input data
into consideration and build no intelligent model; we just look at the labels. If p = 0, i.e. no
change detection, we get the majority class (always DOWN) classifier, which achieves
58% accuracy on this dataset. If p = 1, i.e. a change is alarmed as often as possible, we get the
moving average of one classifier. Figure 2 plots the accuracies in between. Note that if the
labels were independent, random alarms could not boost the accuracy in this way: it would
drop from the 58% of the majority class at p = 0 to the naive 51% at p = 1.
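The thought experiment can be simulated in a few lines. The sketch below rests on stated assumptions: the classifier looks only at the labels, predicts the majority label seen since the last alarm, and predicts DOWN before it has seen anything (a hypothetical default). After each instance, a worthless detector fires with probability p, the model is reset, and training continues from the current label. The name `random_alarm_accuracy` is hypothetical. With p = 1 this reduces to the moving average of one; with p = 0 it becomes a running majority-class classifier, which is close to the always-DOWN classifier when DOWN dominates.

```python
import random
from collections import Counter


def random_alarm_accuracy(labels, p, seed=0, default="DOWN"):
    """Prequential accuracy of a label-only majority classifier that is
    restarted whenever a random 'change alarm' fires (probability p)."""
    rng = random.Random(seed)
    counts = Counter()  # labels seen since the last restart
    correct = 0
    for y in labels:
        guess = counts.most_common(1)[0][0] if counts else default
        correct += guess == y
        if rng.random() < p:  # the alarm ignores X and y entirely
            counts.clear()    # restart the classifier ...
        counts[y] += 1        # ... and keep training on the most recent label
    return correct / len(labels) if labels else 0.0


# Toy autocorrelated stream: long runs of DOWN and UP.
stream = ["DOWN"] * 10 + ["UP"] * 5 + ["DOWN"] * 10
for p in (0.0, 0.1, 0.5, 1.0):
    print(p, random_alarm_accuracy(stream, p))
```

On this toy stream, the function returns 0.8 for p = 0 and 0.92 for p = 1. Only the run boundaries are mispredicted once alarms fire after every instance, so random alarms raise the accuracy exactly as described above.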

In the Appendix, we report the accuracies of several adaptive classifiers implemented
in MOA (Bifet et al., 2010a) on the Electricity dataset, together with the accuracies found in the literature.



In summary, on this dataset, the more random change alarms the classifier fires, the higher the
accuracy. These change alarms are unrelated to detecting concept drift; we do not even
use the input data X in this experiment. Thus, a high accuracy on the Electricity
dataset does not necessarily mean that the adaptation mechanism is good. For such data,
we recommend at least comparing the test accuracies with the accuracy of the moving
average of one.
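The recommended check can be made routine by scoring a classifier's predictions and the moving average of one on exactly the same instances. The helper below assumes that predictions and true labels are available as equally long sequences. Its name and return layout are hypothetical. Instance 0 is skipped for both scores, since the moving average of one has no previous label to copy there.

```python
from typing import Hashable, Sequence, Tuple


def compare_with_persistence(
    labels: Sequence[Hashable], predictions: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Return (classifier accuracy, moving-average-of-one accuracy, margin),
    both scored on instances 1..n-1 so that they are directly comparable."""
    if len(labels) != len(predictions) or len(labels) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    n = len(labels) - 1
    clf_acc = sum(p == y for p, y in zip(predictions[1:], labels[1:])) / n
    base_acc = sum(prev == y for prev, y in zip(labels, labels[1:])) / n
    return clf_acc, base_acc, clf_acc - base_acc
```

A margin near zero or below suggests that the reported accuracy may owe more to label autocorrelation than to the adaptation mechanism.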

This note is intended to be updated. The issue can be discussed, and comments left, at
https://sites.google.com/site/zliobaite/about_electricity
Appendix

Table 1 reports classification accuracies obtained with the MOA (Bifet et al., 2010a)
implementations. Only LeveragingBag and AdaHoeffdingOptionTree outperform the
moving average of one. Table 2 collects classification accuracies on the Electricity dataset
as reported in published papers.



Table 1: Accuracies (%) of adaptive classifiers on the Electricity dataset, tested with MOA.

Algorithm                     Accuracy   Time   Memory
LeveragingBag                 88.6       8.83   0.62
AdaHoeffdingOptionTree        86.7       1.61   0.71
moving average of one         85.3
SingleClassifierDrift EDDM    84.9       1.00   0.00
OzaBagADWIN                   84.5       3.81   0.21
HoeffdingAdaptiveTree         83.6       0.97   0.02
SingleClassifierDrift DDM     82.7       2.23
AccuracyUpdatedEnsemble2      77.6       5.04   0.96
NaiveBayes                    74.2       0.30   0.01
AccuracyUpdatedEnsemble1      72.8       7.25   0.74
AccuracyWeightedEnsemble      71.1       6.33   0.37
AccuracyUpdatedEnsemble       70.6       7.36   0.73
MajorityClass                 57.5       0.20
Table 2: Accuracies (%) of adaptive classifiers on the Electricity dataset reported in the literature.

Algorithms, in table order: DDM, Learn++.CDS, KNN-SPRT, GRI, FISH3, EDDM-IB1, moving average of one, ASHT, bagADWIN, DWM-NB, Local detection, Perceptron, ADWIN, Prop. method, AUE, Cont. λ-perc., CALDS, TA-SVM

Accuracy values, in table order: 89.6*, 88.5, 88.0, 88.0, 85.7, 85.3, 79.1, 68.9

* tested on a subset